The B3 Hackathon brought together computer scientists and ecologists
from a variety of institutions to rapidly create novel informatics
solutions to the biodiversity challenges facing the planet. We
identified that the addition of time-weighting to the R package
“rasterdiv” would be a worthwhile contribution to the
environmental informatics community. rasterdiv was created
to calculate diversity indices with data of the class “raster layer”.
Biodiversity indexes commonly focus on the spatial component. Here we
outline how our extension to the pre-existing implementation of Rao’s
diversity indices (Rocchini, Marcantonio, and
Ricotta 2017) can account for the temporal dimension of data,
alongside the relevant biological context to our extension.
Ecosystems with heterogeneous biota have been shown both experimentally and theoretically to provide greater utility to all the agents which comprise that ecosystem (Zhang et al. 2022). This is through the provision of more resources and a greater variety of niches available for species. This subsequently increases the value of ecosystem services provided to the people in communities surrounding an ecosystem. Heterogeneous ecosystems are typically also more resilient to disturbances they experience, likely because they have functional redundancy (Mace, Norris, and Fitter 2012). Due to the centrality of biodiversity to healthy ecosystem functioning, quantitative measures of biodiversity are required to understand how ecosystems are responding to ongoing environmental changes, such as shifting land use patterns and climate change.
Novel satellite remote sensing tools often generate large amounts of data about the Earth’s surface, due to their almost constant temporal coverage and increasingly precise spatial resolution (Frazier and Hemingway 2021). The big data generated by these systems need to be interpreted effectively to accurately detect and describe ecosystem trends, such as a change in ecosystem diversity. Consequently, information theory has been used to build the analytical tools which are now regularly used to assess remote sensing datasets. For example, Shannon’s H value has been widely used as a proxy for biodiversity, but can be inadequate when applied to the new kinds of data generated by remote sensing platforms (e.g. images from Earth observation satellites). To create data from ecosystems which can be scientifically interpreted, most analytical approaches would assess discrete points within the ecosystem, such as those from a quadrat or transect. A limitation of Shannon’s H value is that it does not consider the distance between each sampled point (whether they are species incidences, pixels, or any other quantitative abstractions of an observation). This approach treats all objects within a dataset as equally distant from one another, thus measures of Shannon’s H value are prone to saturation even when only marginal differences are observed within the objects in a remote sensing dataset.
Rao’s Quadratic Diversity Index (Rao’s Q) adds space as a trait to
its abstraction of biodiversity by accounting for the distance between
observations within a study site. As a spatially informed alternative to
Shannon’s H, Rao’s Q has been demonstrated experimentally to offer
greater efficacy when representing biodiversity in aerial remote sensing
datasets (Rocchini et al. 2021), for which
pixels are the discrete observation units. However, Rao’s Q remains
limited by its inability to assess trait change over time. Current
implementations of the index (for example in the rasterdiv
R package (Rocchini et al. 2021)) only
assess one snapshot of the data at a time. We set out to overcome this
limitation by incorporating Time-Weighted Dynamic Time Warping (TWDTW)
to include time as a component of the distance variable within Rao’s
Q.
Dynamic Time Warping (DTW) is a mathematical approach used to compare data series when the timing of observations differs. It has been used in a variety of disciplines. DTW works by finding the smallest distance between two time series.
However, by flattening the differences in timing, biologically significant differences can also be obscured, such as when comparing plant phenology. For instance, many tree species require a minimum number of Growing Degree Hours (GDH) to commence their springtime budburst (Fu et al. 2019). Other ecosystem processes typically need to coincide with phenological events, so phenology timing represents an important differentiating factor for time series representing ecosystems with plants.
The TWDTW approach rectifies this by including a cost to aligning pixels with greater temporal separation. Therefore, the TWDTW function is less likely to match the time series to others which exhibit substantially different phenologies. This has been successfully demonstrated by (V. Maus et al. 2016) to classify changing land use patterns in the Brazilian Amazon, and was a more effective tool than standard DTW when applied to heterogeneous biological environments like these.
\[ \omega_{i,j} = \frac{1}{1 + e^{-\alpha(g(t_i,t_j) - \beta)}} \]
Reproduced from Maus (V. Maus et al. 2016). In addition to the standard cost matrix of the DTW function, they also propose the equation above to implement a temporal cost. In the equation α is the steepness of the logistic function used for penalisation of time distance, and β is the midpoint of the curve. Lastly, \(g(t_i,t_j)\) represents the time elapsed between the dates evaluated in the match (\(t_i\) and \(t_j\) times of the “\(i\)”th and “\(j\)”th observations).
In this manuscript, we used Sentinel 2’s optical aerial remote sensing data of a small, grazed grassland site in Calabria, Italy to demonstrate and evaluate our R-based implementation of phenology into Rao’s Q index. We also evaluate its efficacy in comparison to Shannon’s H and unmodified Rao’s Q indices.
rasterdiv:We implemented this method within the existing paRao()
function of the rasterdiv R package. We used the
twtwd function from the twdtw R package (Victor Maus 2023). This package is a wrapper
for a C++ implementation of TWDTW.
The Rao’s index with twdtw distance calculated over a time-series of imageries can thus be derived using the following R function:
paRao(x=time.series, time_vector=time, window=11, alpha=1, na.tolerance=0, method="multidimension", dist_m="twdtw", simplify=4, np=8)
The arguments and our input parameters of which are:
x An (X,Y,Z) raster stack (or cube) of
spectral data, where the X and Y axes represent discrete pixel values,
and each layer of the Z axis is a a different temporal snapshot of the
raster layer. In our study, this is the Sentinel derived time series of
our study site in Calabria.
time_vector A vector of dates corresponding to every
point in the raster time series, which must be the same as the
Z axis from the x variable. All pixels in the
input time series must share the same temporal spacing as the temporal
pattern to which it is being compared (i.e. if the time series has
observations on days c(1, 3, 7, ...), then the pattern it
is being compared to must also have observations on days
c(1, 3, 7, ...).
steepness A continuous numeric value corresponding to
the α variable from the time-weighting function in Maus (V. Maus et al. 2016). Different values of α can
increase or decrease penalisation for deviations from the pattern
time.
midpoint A numeric value corresponding to the β variable
from the time-weighting function in Maus (V. Maus
et al. 2016). The input data must be of the unit specified by the
time_scale argument (e.g. in our example it is expressed in
days).
cycle_length This argument indicates the length of a
single cycle. This argument can accept either a string or numeric value.
Valid string input values are “year”, “month”, “day”, “hour”, “minute”,
and “second”. String inputs are also passed to the
time_scale argument. Numeric input values must be the same
unit specified by the time_scale argument.
time_scale This argument sets the units of the TWDTW
function’s cycle length. Valid string input values are “year”, “month”,
“day”, “hour”, “minute”, and “second”. If the input given to
cycle_length is a string value, then this argument will
change the units given in the output. Alternatively, if a numeric value
is given to cycle_length, then this argument will compute
the elapsed time in seconds.
Other arguments remain unchanged from (Victor Maus 2023).
The study site was a small (5 hectare) patch within the Macchia Sacra Special Protection Area. It was selected as it was suitable for thorough imaging by drone, which formed the basis of our ground-truthed Biodiversity observations. With the expertise in classification imparted by an expert botanist, we defined 8 types of plant communities within the study site (Figure 1). The area is characterized by the presence of a road on the north-east part of the site. From the level of the road the elevation declines to a lower part which features a sharp canyon running south to west, the result of a previous small stream which had dried up by the time of our drone survey. This part of the study site is characterized by hydrophilic vegetation. Between these two extremes is a small hill which culminates in a plateau. The plateau is the resting area of a herd of cows which graze in the area. This area is much dryer and subject to strong pasture pressure and mechanical disruption, but is more nutrient which, due to the presence of cow manure.
We used 144 Sentinel 2 images from HRVPP of Plant Phenological Index (PPI) covering all images taken during 2023. The dimensions of each image were 20 by 27 pixels. The PPI index was chosen as it is minimally influenced by soil signal and the presence of shadows (Karkauskaite, Tagesson, and Fensholt 2017). Using these data, we applied 3 analytical approaches to measure biodiversity: The Shannon’s Biodiversity index applied to the mean yearly value with 3 significant digits of the PPI trajectory; the Rao’s Q index with different values of α, applied to the same dataset; and the Rao’s Q index with our implementation of the TWDTW function across the full time series of 144 images. A simple visual inspection of Figure 3 illustrates the unsuitability of Shannon’s H index, as when using a 3 pixel wide moving window, all pixels exhibited different mean biodiversity values and it was impossible to classify the ecosystem into different groups. The standard Rao’s Q index correctly identified the main biodiversity hotspot as the plateau atop the hill, and a secondary hotspot where the road intersects with the study site. We observed that Rao’s Q index does not change, changing alpha given that all pixels are different. Finally, our new implementation of distance resulted in two meaningful differences from the standard Rao’s Q index: the road is no longer a secondary hotspot, and the main biodiversity hotspot moved to the borders between two of the communities identified by our expert.
In this hackathon, we developed a streamlined method for implementing
the TWDTW algorithm to calculate Rao’s quadratic entropy index within
the rasterdiv R package. This addition introduces a
temporal dimension to the traditional spatial analysis of landscape
diversity. Recognising the dynamic nature of plant communities and
ecosystems over time, our method integrates phenological variation into
diversity assessments derived from satellite imagery. Notably, our case
study found that when this technique was applied to multiband remotely
sensed data from disturbed grasslands, that accounting for phenological
cycles can refine diversity indices by filtering out artefacts. For
instance, it can help to distinguish between semi-natural habitats and
artificial land cover types, like roads, which lack temporal
phenological shifts. These artificial features tend to form clusters of
minimal DTW distances when considering DTW as an inter-voxel distance,
leading to lower Rao’s index values.
Remote sensing via Earth observation remains a rapidly developing area of scientific research with wide applicability to conservation practice (Pettorelli, Safi, and Turner 2014). As the technologies continue to mature, the possibilities remote sensing approaches like ours offer continues to grow. For instance, Schulte to Bühne et al (Schulte to Bühne et al. 2022) recently demonstrated the power of Earth observation satellites to assess rewilding efforts at the landscape scale. Rewilding programmes are notoriously challenging technically, and are often expensive, too. Thus, the ability to assess the progress of rewilding inexpensively at the landscape scale via satellite is eminently valuable. However, as Maus et al (V. Maus et al. 2016) observed, markedly different land use types, such as soy bean plantations and primary rainforest, can be erroneously classified as the same via an NDVI based classification protocol which does not consider phenology. Lopes et al (Lopes et al. 2020) also found that accounting for the effects of phenology significantly increased the accuracy of land use classifications. As rewilding and reforestation programmes continue, our implementation of TWDTW reduces the barrier for conservationists trying to assess diversity temporally in addition to spatially. Our case study further demonstrated that incorporation of phenological data enhances a the inferential power of analyses, as incorporation of the temporal dimension more accurately identified biodiversity hotspots than Rao’s Q index alone (Figure 3). Since increasing biodiversity is typically a core goal of rewilding programmes, of which land cover diversity is a core component (Skidmore et al. 2015), more accurate classification of landscape diversity can highlight where efforts are succeeding or where further efforts are needed.
By incorporating temporal dynamics into the rasterdiv R
package, we broaden the scope for analysing remotely sensed time series.
This advancement enriches the suite of diversity indices obtainable from
remote sensing data, enhancing our understanding of landscape
heterogeneity, and in so doing, enhancing our ability to conserve
it.
This manuscript, previous revisions, open source data, and scripts can all be found on the open source GitHub repository “Samuel-Green/B-3-Hackathon-Project-6” via the URL: https://github.com/Samuel-Green/B-3-Hackathon-Project-6
We thank:
Dr. Quentin Groom of Meise Botanic Garden for co-ordinating the project.
Laura Abraham of Meise Botanic Garden for organising the event.
Prof. Nicodemo Passalacqua (University of Calabria) who kindly shared the ground-truthing data within pilot project 2.3.2 of the Tech4you PNRR project.
The European Union’s Horizon Europe Research and Innovation Programme (ID No 101059592) for funding the B3 programme, and thus, this event.